WATSTAT.TXT | JCSM Shareware Collection, November 1993 | cl760/stat1j.lzh | 1991-12-26 | 72KB | 1,061 lines
*DISCLAIM,A
IMPORTANT:
Always consider WATSTAT's recommendations as a STARTING POINT and NOT
THE FINAL WORD: they are merely intended to serve as guides to further study
and consultation. WATSTAT can only recommend what is USUALLY appropriate,
given the specifications you provide. Other unspecified factors may override
those that WATSTAT considers. Moreover, it would be unwise to ignore
such "non-statistical" factors as: what procedures make the most theoretical
sense; what procedures are established and expected in your field; and what
procedures you and your readers will be able to interpret.
*RAND,A
NOTE: Since you specified Random Sampling or Random Assignment, it is
legitimate to use INFERENTIAL STATISTICS (Significance Tests & Confidence
Limits) as well as DESCRIPTIVE STATISTICS. But when you use Inferential
statistics, you must still report important Descriptive statistics, such as
means & standard deviations, percentages, or correlation coefficients.
*NONRAND,A
NOTE: Since you have a non-random sample, NO INFERENTIAL STATISTICS
(such as Significance Tests or Confidence limits) are appropriate. Hence,
WATSTAT will recommend only DESCRIPTIVE STATISTICS.
*WHAT_DES,A
Report all Descriptive statistics needed to characterize your sample
(e.g., demographics) and, depending upon your analytical focus, report those
that most clearly show: 1) the magnitude of sub-sample differences; 2) the
strength & direction of associations; or 3) the characteristics of a single
variable's distribution, e.g., its "average," "dispersion," and "shape."
In deciding what Descriptive statistics to report, ask yourself: "What
information will a reader need to REPLICATE my analysis or to COMPARE my
results to those of others?"
*D-UNI-NOM,A
Summarize the distribution with a percentage table and point out the
Modal and sparse categories. Optionally, present percentages graphically
in a bar or pie chart.
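The percentage-table summary described above can be sketched in a few lines of Python. The data here are hypothetical, and the function names are illustrative, not part of WATSTAT:

```python
# Sketch: percentage table and Modal category for a Nominal variable.
from collections import Counter

def percentage_table(values):
    """Return {category: percent} for a list of Nominal observations."""
    counts = Counter(values)
    n = len(values)
    return {cat: 100.0 * c / n for cat, c in counts.items()}

def modal_category(values):
    """Category with the highest frequency (the Mode)."""
    return Counter(values).most_common(1)[0][0]

# Hypothetical sample: 6 "A", 3 "B", 1 "C"
sample = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
table = percentage_table(sample)   # A: 60%, B: 30%, C: 10%
mode = modal_category(sample)      # "A" is Modal; "C" is sparse
```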
*D-NOM-SMALL,A
CAUTION: Due to your small sample size, each case counts for more than 1%
and a seemingly large between-category % difference could be due to very few
cases. Take this into account in deciding whether percentage differences
reflect important substantive differences in the cases you're describing.
*D-UNI-RANK,A
If your data are inherently in the form of ranks, sample size determines
all the key descriptive statistics and there is no need to report them. You
should report the number of ties and the ranks on which most ties occur.
If you have an Ordinal variable (not originally in ranks) the Median is
the appropriate "average" and the Quartile Deviation the appropriate index
of "dispersion." Usually, it is also appropriate to report some additional
Percentiles to give a more complete picture of the variable's distribution,
for example, the 25th & 75th Percentiles, or the upper and lower Deciles.
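The Median, Quartile Deviation, and extra Percentiles recommended above can be computed as follows. This is a minimal pure-Python sketch using linear interpolation between ranks; statistical packages may use slightly different percentile conventions:

```python
# Sketch: Median, Quartile Deviation, and selected Percentiles.
def percentile(vals, p):
    """Percentile via linear interpolation (0 <= p <= 100)."""
    s = sorted(vals)
    k = (len(s) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def median(vals):
    return percentile(vals, 50)

def quartile_deviation(vals):
    """Half the distance between the 75th and 25th Percentiles."""
    return (percentile(vals, 75) - percentile(vals, 25)) / 2.0

# Hypothetical Ordinal scores:
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
m = median(data)               # 3
qd = quartile_deviation(data)  # (4 - 2) / 2 = 1.0
```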
*D-UNI-PART,A
If your Ordinal categories allow, compute the Median and Quartile Devia-
tion to index the "average" and "degree of dispersion," respectively. If
data are inherently grouped and if it is inappropriate to compute the Median
exactly, report the category it falls in and its approximate location in the
category. Summarize the distribution with a percentage table and point out
the Modal and sparse categories. Optionally, present percentages graphically
in a bar or pie chart.
*D-UNI-INT,A
If your data are dichotomized, report the cut-point that divides the
categories and the percentage (or proportion) of cases in each category.
If your data are continuous or grouped into 3 or more categories, use the
Mean and Standard Deviation to index the "average" and "dispersion" of the
distribution. If the distribution is highly skewed or if there are some
extreme values that could make the Mean a "misleading average," report the
Median instead of, or in addition to, the Mean. Whether or not the data are
skewed, it is usually wise to report some key Percentiles to provide a more
complete picture of the distribution, for example, the 25th & 75th Percent-
iles, or the upper and lower Deciles.
If the data are grouped, a Percentage Table or equivalent graphic (e.g.,
a bar chart) is usually appropriate. If you don't use a percentage table
with grouped data, consider reporting where the Mode falls and which, if
any, categories are exceptionally sparse.
If the data are continuous and if it is important to describe the shape
of the distribution, consider grouping the data and using procedures noted
in the preceding paragraph. Alternatively, you could present the data in a
Frequency Polygon (line chart) or in an Ogive (a line chart that shows the
cumulative frequency distribution).
*D-COMP1-NOM,A
     Percentage tables are usually best for comparing Nominal distribu-
tions across sub-samples. Use Percentage Differences to index the magnitude
of sub-sample differences, and point out the Modal and sparse categories for
each sub-sample. Optionally, present percentages graphically in bar charts.
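A Percentage Difference is simply the gap between two sub-samples' percentages on the same category, as in this sketch (all counts hypothetical):

```python
# Sketch: Percentage Difference between two sub-samples on one category.
def pct(count, total):
    return 100.0 * count / total

# Hypothetical counts of "Yes" responses in two sub-samples:
yes_a, n_a = 30, 60   # sub-sample A: 50% "Yes"
yes_b, n_b = 45, 60   # sub-sample B: 75% "Yes"
pct_diff = pct(yes_b, n_b) - pct(yes_a, n_a)   # 25 percentage points
```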
*D-COMP2-NOM,A
     Percentage tables are usually best for comparing Nominal distribu-
tions across sub-samples. Use Percentage Differences to index the magnitude
of sub-sample differences, and point out the Modal and sparse categories for
each sub-sample. Multivariate percentage tables are appropriate for showing
differences across two or more Independent (Comparison) variables, especial-
ly when there are important Interaction (Specification) effects. However,
such tables are more difficult to read, so it is usually advisable to break
them into a set of bivariate Partial Tables. Standardized Percentage Tables
can be used to adjust for one or more Comparison variables without showing
them directly in the tables, but standardization can only be used for Com-
parison variables that do not Interact with others. As an alternative to
tables, consider presenting percentages graphically in bar charts.
*D-COMP-RANK,A
If your Dependent variable is inherently in the form of ranks, your best
option is probably to compare Mean Ranks across sub-samples. However, keep
in mind that Mean Ranks are not the same as means computed on Interval data,
so the absolute size of sub-sample differences is not meaningful: focus only
on "greater-than" and "less-than" relationships between Mean Ranks of your
sub-samples. Unless ties are rare, report the number of ties and the ranks
on which most ties occur.
If your Ordinal Dependent variable is not ranked, the Median is the
appropriate "average" and the Quartile Deviation the appropriate index of
"dispersion." Compare Medians across sub-samples, and search for possible
"interaction effects" between Comparison variables. Focus on the RELATIVE
SIZE of sub-sample Medians (i.e., "greater-than" & "less-than" relations),
because the absolute magnitude of Ordinal-scale Medians is not meaningful.
Usually, it is also appropriate to report some additional Percentiles (e.g.,
the 25th & 75th Percentiles or the highest & lowest Deciles) to give a more
complete picture of each sub-sample distribution.
*D-COMP-PART,A
The best way to assess differences on a "Partially Ordered" variable
depends on whether you're able to compute sub-sample Medians.
If your data allow you to determine Medians exactly, report the Medians
for all sub-samples and focus on the RELATIVE SIZE of sub-sample Medians
(i.e., "greater-than" & "less-than" relations), since the absolute magnitude
of Ordinal-scale Medians is not meaningful. If you have two or more Compar-
ison Variables, search for possible "interactions" between these variables.
If the grouping of data doesn't allow you to compute Medians, you won't
be able to compare sub-sample "averages" in a way that takes full advantage
of the Dependent variable's Ordinal properties. The best approach in this
case is to present the data in Percentage Tables, which assume only Nominal
measurement. (Optionally, present percentages graphically in bar charts.)
Use % Differences to index the magnitude of sub-sample differences and point
out the Modal and sparse categories for each sub-sample. Since you should
be able to specify the CATEGORIES THAT CONTAIN THE MEDIAN for the various
sub-samples, you can also base comparisons on the APPROXIMATE location of
Medians; since categories are ordered, you should also be able to interpret
an approximate difference in Medians as evidence that one sub-sample has a
higher "average" than another.
*D-COMP1-INT,A
With Interval Dependent Variables it is usually appropriate to base
sub-sample comparisons on Means. Report all sub-sample Means and Standard
Deviations.
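Reporting all sub-sample Means and Standard Deviations can be sketched as below (hypothetical group labels and scores):

```python
# Sketch: Mean and sample Standard Deviation for each sub-sample.
import math
from collections import defaultdict

def group_stats(pairs):
    """pairs: (group_label, score) tuples. Returns {group: (mean, sd)}."""
    groups = defaultdict(list)
    for g, x in pairs:
        groups[g].append(x)
    out = {}
    for g, xs in groups.items():
        m = sum(xs) / len(xs)
        sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))
        out[g] = (m, sd)
    return out

# Hypothetical data: two sub-samples of an Interval Dependent variable.
data = [("control", 10), ("control", 12), ("control", 14),
        ("treated", 15), ("treated", 17), ("treated", 19)]
stats = group_stats(data)   # control: mean 12; treated: mean 17
```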
*D-COMP2-INT,A
If you have two or more Comparison Variables, search for possible inter-
actions. If you have one or more Interval-Level Independent variables that
you wish to control ("hold constant"), Analysis of Covariance procedures can
be used to adjust sub-sample Means for such variables.
*D-COMP-DICH,A
Percentage tables are usually best for comparing Dichotomous Dependent
variables across sub-samples, but it may be appropriate to use Rates or
Proportions rather than %'s, especially if the Dependent variable represents
a relatively rare occurrence, such as a disease or mortality outcome. [Note
that Rates & Proportions may be analyzed and tabulated in much the same way
as Percentages, although they are expressed on different scales.]
Use % Differences [or Rate or Proportion Differences] to index the magni-
tude of sub-sample differences, and point out the Modal and sparse catego-
ries for the various sub-samples. Multivariate tables are appropriate for
showing differences across two or more Independent (Comparison) variables,
especially when important Interaction (Specification) effects are present.
However, such tables are more difficult to read, so it may be advisable to
break them into a set of bivariate Partial Tables. "Standardized Partial
Percentage Tables" can be used to adjust for one or more Independent vari-
ables without showing them directly in the tables, but standardization can
only be used for Independent variables that do not Interact with others.
Instead of tables, consider presenting Percentages [or Rates or Proportions]
in graphic charts.
*D-COMP-OTHER2,A
Except for Interval Dependent Variables, there is no procedure designed
to handle simultaneous sub-sample comparisons for 2 or more Dependent vari-
ables. Your only option is to run a separate analysis for each Dependent
variable. To get recommendations appropriate for these separate analyses,
return to WATSTAT's Choice Boxes and select an Option other than "2 or More
Dependent Variables" in Box 4.
*D-BIVAR-NOM/NOM,A
If the two Nominal variables are dichotomized, use the Phi Coefficient
as a measure of association. If either or both of your Nominal variables
has 3 or more categories, use Cramer's V, which is the same as Phi except
that it adjusts for the number of categories.
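Both statistics can be computed from the chi-square value of the contingency table. This is an illustrative pure-Python sketch (it assumes all row and column totals are nonzero); for a 2 x 2 table Cramer's V equals the absolute value of Phi:

```python
# Sketch: chi-square and Cramer's V for a contingency table.
import math

def chi_square(table):
    """Pearson chi-square for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2

def cramers_v(table):
    """Cramer's V; equals |Phi| when the table is 2 x 2."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square(table) / (n * (k - 1)))

# Perfect association -> V = 1.0; statistical independence -> V = 0.0:
v_perfect = cramers_v([[10, 0], [0, 10]])
v_none = cramers_v([[5, 5], [5, 5]])
```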
*D-BIVAR-NOM/RANK,A
There is no statistic specifically designed to measure the association
between a Nominal Dependent variable and an Ordinal Independent variable.
Your only choice is to break the Ordinal variable into categories and treat
it as Nominal. If you dichotomize it, select a cut-point as close to the
Median as possible; if you break it into 3 or more categories, select cut-
points that yield approximately equal frequencies across categories. Once
the Ordinal variable is categorized, the appropriate statistics are those
for two Nominal variables.
If the two Nominal variables are dichotomized, use the Phi Coefficient
as a measure of association. If either or both of your Nominal variables
has 3 or more categories, use Cramer's V, which is the same as Phi except
that it adjusts for the number of categories.
*D-BIVAR-NOM/PART,A
There is no statistic specifically designed to measure the association
between a Nominal Dependent variable and an Independent variable that is
cast in the form of Ordinal categories. Your only choice is to treat the
Ordinal variable as if it were a set of Nominal categories, and the only
appropriate statistics are those for two Nominal variables.
If the two Nominal variables are dichotomized, use the Phi Coefficient
as a measure of association. If either or both of your Nominal variables
has 3 or more categories, use Cramer's V, which is the same as Phi except
that it adjusts for the number of categories.
*D-BIVAR-NOM/INT,A
There is no statistic specifically designed to measure the association
between a Nominal Dependent variable and an Interval Independent variable,
so you have two OPTIONS: 1) break the Interval variable into categories and
treat it as Nominal, or 2) dichotomize the Dependent variable and treat it
as Interval.
If you choose OPTION 1, break the Independent variable into categories
that contain approximately equal numbers of cases. Once this is done, the
appropriate statistics are those for two Nominal variables.
If the two Nominal variables are dichotomized, use the Phi Coefficient as
a measure of association. If either or both of your Nominal variables has
3 or more categories, use Cramer's V, which is the same as Phi except that
it adjusts for the number of categories.
If you choose OPTION 2, dichotomize the Dependent variable as close as
possible to the Median unless there is theoretical justification for using
another "high vs. low" cut-point. The dichotomized Dependent variable may
now be assigned arbitrary scores of 0 for "low" and 1 for "high" and may,
within limits, be treated as an Interval scale. Once this is done, you can
use the Linear Correlation Coefficient (Pearson's r and r-squared) to index
the strength and direction of the relationship. But if your problem calls
for regression statistics, Linear Regression may not be appropriate: with a
dichotomous Dependent variable some predicted (Y') scores may have impossi-
ble values (less than 0 or greater than 1). If these impossible values are
numerous or if they will cause problems in interpreting your results, use
Logistic Regression instead.
*D-BIVAR-RANK/NOM,A
There is no statistic specifically designed to measure the association
between an Ordinal Dependent variable and a Nominal Independent variable.
Your only choice is to break the Ordinal variable into categories and treat
it as Nominal. If you dichotomize it, select a cut-point as close to the
Median as possible; if you break it into 3 or more categories, select cut-
points that yield approximately equal frequencies across categories. Once
the Ordinal variable is categorized, the appropriate statistics are those
for two Nominal variables.
If the two Nominal variables are dichotomized, use the Phi Coefficient as
a measure of association. If either or both of your Nominal variables has
3 or more categories, use Cramer's V, which is the same as Phi except that
it adjusts for the number of categories.
*D-BIVAR-RANK/RANK,A
If both variables are in the form of ranks, you can proceed to compute one
of the measures of association noted below. Otherwise, you must transform
them to ranks before proceeding.
Spearman's Rho is the best known measure of association for two Ordinal
variables and, because it is simply the Linear Correlation Coefficient
(Pearson's r) applied to ranks, it is often interpreted as an approximate
index of linear correlation. The "correction for ties" should be applied
to Rho, but it has little effect if fewer than 30% of the cases are tied.
In some fields the preferred statistic is Kendall's Tau, which, unlike
Spearman's Rho, does not involve any arithmetical operations that assume
an underlying Interval Scale. This statistic is sometimes referred to as
"Tau-A" to distinguish it from modified forms ("Tau-B" and "Tau-C") that are
applied to "ordered contingency tables." The computing formulas for Tau-A
found in most texts incorporate a correction for tied ranks.
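For small data sets both statistics can be sketched in pure Python. The ranks function below uses mid-ranks for ties (as Rho requires); the Tau-A function makes no correction for ties, matching the definition above:

```python
# Sketch: Spearman's Rho (Pearson's r on ranks) and Kendall's Tau-A.
def ranks(vals):
    """1-based mid-ranks: tied values get the average of their ranks."""
    order = sorted(range(len(vals)), key=lambda i: vals[i])
    r = [0.0] * len(vals)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and vals[order[j + 1]] == vals[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(x, y):
    return pearson_r(ranks(x), ranks(y))

def sign(v):
    return (v > 0) - (v < 0)

def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; no tie correction."""
    n = len(x)
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
            for i in range(n) for j in range(i + 1, n))
    return 2.0 * s / (n * (n - 1))
```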
*D-BIVAR-RANK/PART,A
There is no statistic specifically designed to measure the association
between a "true" Ordinal Dependent variable and a "partially ordered" ind-
ependent variable. Your best choice is to break the Dependent variable into
ordered categories and treat both variables as "partially ordered." Prior
to computations, copy the data into a contingency table in which rows are
categories of the Dependent variable and columns are categories of the
Independent variable. Use one of the following measures of association:
The best statistic for most ordered contingency tables is a modified form
of Kendall's Tau: use Tau-B if the number of rows in the table equals the
number of columns; use Tau-C if the table is not "square."
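Tau-B and Tau-C can both be computed directly from the ordered contingency table. This sketch uses the textbook formulas (concordant minus discordant pairs, with Tau-B's tie-adjusted denominator and Tau-C's correction for non-square tables):

```python
# Sketch: Tau-B and Tau-C for an ordered contingency table
# (rows = Dependent variable categories, columns = Independent variable).
def tau_b_tau_c(table):
    rows, cols = len(table), len(table[0])
    n = sum(sum(r) for r in table)
    conc = disc = 0
    # Count concordant and discordant pairs across cells in different rows:
    for i in range(rows):
        for j in range(cols):
            for i2 in range(i + 1, rows):
                for j2 in range(cols):
                    if j2 > j:
                        conc += table[i][j] * table[i2][j2]
                    elif j2 < j:
                        disc += table[i][j] * table[i2][j2]
    row_t = [sum(r) for r in table]
    col_t = [sum(c) for c in zip(*table)]
    pairs = n * (n - 1) / 2.0
    ties_r = sum(t * (t - 1) / 2.0 for t in row_t)
    ties_c = sum(t * (t - 1) / 2.0 for t in col_t)
    tau_b = (conc - disc) / ((pairs - ties_r) * (pairs - ties_c)) ** 0.5
    m = min(rows, cols)
    tau_c = 2.0 * m * (conc - disc) / (n * n * (m - 1))
    return tau_b, tau_c
```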
*D-BIVAR-RANK/INT,A
There is no statistic specifically designed to measure the association
between an Ordinal Dependent variable and an Interval Independent variable.
If you can't assume that the Dependent variable is Interval, you'll have to
"downgrade" the Independent variable and treat it as an Ordinal scale. If
you can transform it to ranks, do so, and apply one of the measures of
association recommended below. [If it is so grouped that it can only be
transformed into a set of ordered categories, go back through WATSTAT's Choice
Boxes and pick Option 3, "Ordered Categories," as the Level of Measurement
for the Independent variable.]
Spearman's Rho is the best known measure of association for two Ordinal
variables and, because it is simply the Linear Correlation Coefficient
(Pearson's r) applied to ranks, it is often interpreted as an approximate
index of linear correlation. The "correction for ties" should be applied to
Rho, but it has little effect if fewer than 30% of the cases are tied.
In some fields the preferred statistic is Kendall's Tau, which, unlike
Spearman's Rho, does not involve any arithmetical operations that assume
an underlying Interval Scale. This statistic is sometimes referred to as
"Tau-A" to distinguish it from modified forms ("Tau-B" and "Tau-C") that are
applied to "ordered contingency tables." The computing formulas for Tau-A
found in most texts incorporate a correction for tied ranks.
*D-BIVAR-PART/NOM,A
There is no statistic specifically designed to measure the association
between a set of ordered categories and a Nominal Independent variable, and
your only option is to "downgrade" the Dependent variable to the Nominal
level. For two Nominal variables the following recommendations apply.
If the two Nominal variables are dichotomized, use the Phi Coefficient as
a measure of association. If either or both of your Nominal variables has
3 or more categories, use Cramer's V, which is the same as Phi except that
it adjusts for the number of categories.
*D-BIVAR-PART/RANK,A
There is no statistic specifically designed to measure the association
between a "partially ordered" Dependent variable and a "true" Ordinal ind-
ependent variable. Your best choice is to break the Independent variable
into ordered categories and treat both variables as "partially ordered."
Prior to computations, copy the data into a contingency table in which rows
are categories of the Dependent variable and columns are categories of the
Independent variable. Use one of the following measures of association:
The best statistic for most ordered contingency tables is a modified form
of Kendall's Tau: use Tau-B if the number of rows in the table equals the
number of columns; use Tau-C if the table is not "square."
*D-BIVAR-PART/PART,A
Prior to computations, copy the data into a contingency table in which
rows are categories of the Dependent variable and columns are categories of
the Independent variable. Use one of the following measures of association:
The best statistic for most ordered contingency tables is a modified form
of Kendall's Tau: use Tau-B if the number of rows in the table equals the
number of columns; use Tau-C if the table is not "square."
*D-BIVAR-PART/INT,A
There is no statistic specifically designed to measure the association
between a "partially ordered" Dependent variable and an Interval Independent
variable. The best alternative is to break the Independent variable into
ordered categories and treat both variables as "partially ordered." Prior
to your computations, copy the data into a contingency table in which rows
are categories of the Dependent variable and columns are categories of the
Independent variable. Then use one of the following indices of association:
The best statistic for most ordered contingency tables is a modified form
of Kendall's Tau: use Tau-B if the number of rows in the table equals the
number of columns; use Tau-C if the table is not "square."
*D-BIVAR-INT/NOM,A
The preferred measure of association for an Interval Dependent variable
and a Nominal Independent variable is the Correlation Ratio (Eta). The Eta
statistic indexes the strength of a relationship of any form, including
non-monotonic (e.g., U-shaped). Eta-Squared is commonly reported instead of
Eta, since it has a more meaningful interpretation: it measures the propor-
tion of variance in the Dependent variable explained by the categories of
the Independent variable.
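Eta-Squared is the between-category sum of squares divided by the total sum of squares, as this sketch shows (hypothetical scores grouped by Nominal category):

```python
# Sketch: Eta-Squared = between-group SS / total SS.
def eta_squared(groups):
    """groups: one list of Interval scores per Nominal category."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    return ss_between / ss_total

# Hypothetical: scores differ sharply between the two categories,
# so most of the variance is "explained" and Eta-Squared is near 1.
e2 = eta_squared([[1, 2, 3], [11, 12, 13]])
```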
*D-BIVAR-INT/RANK,A
There is no statistic specifically designed to measure the association
between an Interval Dependent variable and an Ordinal Independent variable.
If you can't assume that the Independent variable is Interval, you'll have to
"downgrade" the Dependent variable and treat it as an Ordinal scale. If
you can transform it to ranks, do so, and apply one of the measures of
association recommended below. [If it is so grouped that it can only be
transformed into a set of ordered categories, go back through WATSTAT's Choice
Boxes and pick Option 3, "Ordered Categories," as the Level of Measurement
for the Dependent variable.]
Spearman's Rho is the best known measure of association for two Ordinal
variables and, because it is simply the Linear Correlation Coefficient
(Pearson's r) applied to ranks, it is often interpreted as an approximate
index of linear correlation. The "correction for ties" should be applied
to Rho, but it has little effect if fewer than 30% of the cases are tied.
In some fields the preferred statistic is Kendall's Tau, which, unlike
Spearman's Rho, does not involve any arithmetical operations that assume
an underlying Interval Scale. This statistic is sometimes referred to as
"Tau-A" to distinguish it from modified forms ("Tau-B" and "Tau-C") that are
applied to "ordered contingency tables." The computing formulas for Tau-A
found in most texts incorporate a correction for tied ranks.
*D-BIVAR-INT/PART,A
There is no statistic specifically designed to measure the association
between an Interval Dependent variable and a "partially ordered" Independent
variable, so you have 2 OPTIONS: 1) "downgrade" the Dependent variable by
breaking it into ordered categories, or 2) "downgrade" the Independent vari-
able to a Nominal scale. OPTION 2 is the best choice if you're interested
mainly in the strength of the relationship, but since the Independent vari-
able is assumed to be merely Nominal, you won't be able to determine the
direction (+/-) of the relationship.
If you choose OPTION 1, you should break the Dependent variable into cat-
egories that contain approximately equal numbers of cases. Copy the data
into a contingency table in which rows are categories of the Dependent vari-
able and columns are categories of the Independent variable. Then compute
one of the following indices recommended for ordered contingency tables.
The best statistic for most ordered contingency tables is a modified form
of Kendall's Tau: use Tau-B if the number of rows in the table equals the
number of columns; use Tau-C if the table is not "square."
If you choose OPTION 2, every category of the Independent variable MUST
contain at least 2 cases (preferably more), so you might have to collapse
some sparse categories. However, categories should not be collapsed without
restraint: it is also desirable to have as many categories as possible.
The preferred measure of association for an Interval Dependent variable
and a Nominal Independent variable is the Correlation Ratio (Eta). The Eta
statistic indexes the strength of a relationship of any form, including
non-monotonic (e.g., U-shaped). The square of the Eta (Eta-Squared) is
commonly reported instead of Eta, since it has a more meaningful interpret-
ation: it measures the proportion of variance in the Dependent variable
explained by the categories of the Independent variable.
*D-BIVAR-INT/INT,A
In most situations the preferred index of association for two Interval
variables is the Linear Correlation Coefficient, also called Pearson's r.
The square of the r statistic, known as the Coefficient of Determination, is
often reported along with r, because it measures the proportion of variance
in one variable explained by the other.
If you're interested in predicting or estimating scores on the Dependent
variable from those on the Independent variable, you should compute the
Linear Regression statistics: the Regression Coefficient, the Y-Intercept,
and the Standard Error of Estimate.
If you suspect that the relationship departs markedly from linearity, so
that Pearson's r underestimates its "true" strength, you can use the Correl-
ation Ratio (Eta) instead. This will require breaking the Independent vari-
able into a set of categories, preferably in such a way that 5 or more cases
fall in each category. Eta indexes the strength of a relationship of any
form, including those which are non-monotonic (e.g., U-shaped). Eta-squared
is commonly reported instead of Eta, because it has a more meaningful inter-
pretation: it measures the proportion of variance in the Dependent variable
explained by the categories of the Independent variable.
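The Linear Correlation and Regression statistics named above can be sketched in pure Python (hypothetical data; the example is exactly linear, so r = 1, b = 2, a = 0, and the Standard Error of Estimate is 0):

```python
# Sketch: Pearson's r, Regression Coefficient (b), Y-Intercept (a),
# and Standard Error of Estimate for a bivariate Interval data set.
import math

def linreg(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    b = sxy / sxx                      # Regression Coefficient
    a = my - b * mx                    # Y-Intercept
    sse = sum((yy - (a + b * xx)) ** 2 for xx, yy in zip(x, y))
    see = math.sqrt(sse / (n - 2))     # Standard Error of Estimate
    return r, b, a, see

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r, b, a, see = linreg(x, y)
r_squared = r ** 2                     # Coefficient of Determination
```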
*D-BIVAR-DICH/NOM,A
Even if your dichotomous Dependent variable is Ordinal or Interval, it is
probably best to treat it as Nominal, like your Independent variable, and
use a measure of association for two Nominal variables.
If the two Nominal variables are dichotomized, use the Phi Coefficient as
a measure of association. If either or both of your Nominal variables has
3 or more categories, use Cramer's V, which is the same as Phi except that
it adjusts for the number of categories.
*D-BIVAR-DICH/RANK,A
There is no statistic specifically designed to measure the association
between a dichotomous Dependent variable and an Ordinal Independent vari-
able. You'll first have to break the Independent variable into categories
and then you'll have 2 OPTIONS: 1) assume the Dependent variable is Ordinal
and use a measure of association for two "partially ordered" variables, or
2) assume that both variables are merely Nominal and use a measure for two
Nominal variables. Option 1 is usually preferable, but choose Option 2 if
it makes no sense to treat the dichotomous Dependent variable as Ordinal.
If you choose Option 1, copy the data into an ordered contingency table
and compute one of the following:
The best statistic for most ordered contingency tables is a modified form
of Kendall's Tau: use Tau-B if the number of rows in the table equals the
number of columns; use Tau-C if the table is not "square."
If you choose Option 2, copy the data into a contingency table, making no
assumption about the order of rows & columns. Then use one of the following
measures appropriate for two Nominal scales:
If the two Nominal variables are dichotomized, use the Phi Coefficient as
a measure of association. If either or both of your Nominal variables has
3 or more categories, use Cramer's V, which is the same as Phi except that
it adjusts for the number of categories.
*D-BIVAR-DICH/PART,A
With a dichotomous Dependent variable and a "partially ordered" independ-
ent variable, you have 2 OPTIONS: 1) assume the Dependent variable is also
Ordinal and use a measure of association for two "partially ordered" vari-
ables, or 2) assume the Independent variable is only Nominal and use a meas-
ure of association for two Nominal variables. Option 1 is usually better.
If you choose Option 1, copy the data into an ordered contingency table
and compute one of the following:
The best statistic for most ordered contingency tables is a modified form
of Kendall's Tau: use Tau-B if the number of rows in the table equals the
number of columns; use Tau-C if the table is not "square."
If you choose Option 2, copy the data into a contingency table, making no
assumption about the order of rows & columns. Then use one of the following
measures appropriate for two Nominal scales:
If the two Nominal variables are dichotomized, use the Phi Coefficient as
a measure of association. If either or both of your Nominal variables has
3 or more categories, use Cramer's V, which is the same as Phi except that
it adjusts for the number of categories.
*D-BIVAR-DICH/INT,A
With a dichotomous Dependent variable and an Interval Independent vari-
able, you have 2 OPTIONS: 1) assume that the dichotomy is an Interval vari-
able, or 2) "downgrade" the Independent variable to the Nominal level. For
Option 1, which is usually preferable, you'd use a measure of association
for two Interval variables. For Option 2, you'd first break the Independent
variable into categories and use a measure of association for two Nominal
variables.
If you choose OPTION 1, assign arbitrary scores of 0 (low) and 1 (high)
to categories of the Dependent variable. Then use the Linear Correlation
Coefficient (Pearson's r and r-squared) to measure the strength and direc-
tion (+/-) of the relationship. If you're mainly interested in predicting
Dependent variable scores from those on the Independent variable, compute
regression statistics (Regression Coefficient, Y-Intercept, & Standard Error
of Estimate). But note that Linear Regression may not be appropriate: with
a dichotomous Dependent variable, some scores predicted from the regression
equation (Y'= A+bx) may have impossible values (i.e., less than 0 or greater
than 1). If there are many impossible values or if they will cause problems
in interpreting your results, use Logistic Regression instead.
If you take OPTION 2, divide the Independent variable into categories
that contain about the same number of cases and use one of the following:
If the two Nominal variables are dichotomized, use the Phi Coefficient as
a measure of association. If either or both of your Nominal variables has
3 or more categories, use Cramer's V, which is the same as Phi except that
it adjusts for the number of categories.
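OPTION 1 above can be sketched as follows: score the dichotomy 0/1 and apply Pearson's r, which is then equivalent to the point-biserial correlation. The data are hypothetical:

```python
# Sketch of OPTION 1: dichotomous Dependent variable scored 0/1,
# correlated with an Interval Independent variable via Pearson's r.
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: illness (0 = well, 1 = ill) rises with age.
age = [25, 30, 35, 40, 45, 50, 55, 60]   # Interval Independent variable
ill = [0, 0, 0, 0, 1, 1, 1, 1]           # Dichotomous Dependent, scored 0/1
r = pearson_r(age, ill)                  # positive and fairly strong
```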
*D-MUL-SMALL-INT,A
WARNING: The SAMPLE SIZE you specified may be TOO SMALL to support the type
of multivariate procedure(s) WATSTAT recommended. As a practical rule of
thumb you should have a minimum of about 10 cases for each variable in such
procedures. To meet this criterion you may have to drop some variables from
the analysis. If you can't drop enough to approach the 10-case-per-variable
criterion, you shouldn't use the above procedure(s).
*D-MUL-SMALL-NOM,A
WARNING: The SAMPLE SIZE you specified may be TOO SMALL to use Multivariate
Procedures for Nominal Variables, of the sort recommended. Computations for
such methods are based on cross-tabulations, and as the number of variables
(& categories) increases, cell frequencies can become too sparse to support
the analysis. You may need to drop some variables from the analysis and/or
collapse variables into fewer categories.
*D-MUL-1DEP-NOM/NOM,A
The recommended procedure (and the only one available) for measuring the
association between a Nominal-level Dependent and a set of Nominal independ-
ent variables is Log-Linear Analysis. In most cases, this procedure will
require the use of a computer and many popular statistical software packages
can run it. A good deal of statistical sophistication is required to apply
it and to interpret its results. Log-Linear Analysis may not be widely used
in your field and, if not, the task of reporting your results will be some-
what more difficult. The use of Log-linear Analysis is also limited by the
substantial sample size it usually requires.
However, no alternative procedure is applicable unless you're willing to
dichotomize the Dependent variable (so it can be scored 0/1 and treated as
Interval) and to transform all the Independent variables and also treat them
as Interval. The latter step would involve either: 1) dichotomizing each
Independent variable and assigning "0" & "1" scores to its categories; or
2) creating a set of "dummy variables" (each scored 0/1) to represent its
categories. After these transformations, you can apply either Logistic
Regression or Discriminant Analysis. For more info about these procedures,
return to WATSTAT's Choice Boxes and specify "Dichotomous" for the depen-
dent (Box 5) variable & "Interval" for the Independent (Box 6) variables.
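[The Python sketch below (an illustration only, with a hypothetical
variable; not part of WATSTAT) shows one common way to create a set of
"dummy variables" from a Nominal variable's categories.]

```python
def dummy_code(values, categories=None):
    """Represent a Nominal variable as a set of 0/1 "dummy variables".
    One category (the last) is omitted as the reference category."""
    if categories is None:
        categories = sorted(set(values))
    kept = categories[:-1]          # last category = reference (all 0's)
    return {c: [1 if v == c else 0 for v in values] for c in kept}

# Hypothetical Nominal variable with 3 categories:
region = ["North", "South", "West", "South", "North"]
dummies = dummy_code(region)
# Two dummy variables represent the 3 categories; "West" is the
# reference category, scored 0 on both.
```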
*D-MUL-1DEP-NOM/INT,A
The only procedure designed to assess the association between a Nominal
Dependent & a set of Interval Independent variables is Discriminant Analysis.
This procedure does not produce a single index (analogous to a correlation
coefficient), but instead yields a set of prediction equations, called
"Discriminant Functions," the interpretation of which requires a good deal
of statistical expertise. Computations must be done by computer and most
statistical software packages include Discriminant Analysis routines.
Interpretation of results is considerably simpler if the Dependent vari-
able is dichotomized, but if this is done, Logistic Regression and Multiple
Correlation/Regression would also be applicable and perhaps preferable.
*D-MUL-1DEP-NOM/MIXIO,A
There is no procedure available to measure association between a Nominal
Dependent variable and Independent variables with "mixed" levels of measure-
ment, so you'll need to transform one or more Independent variables to make
them all either Nominal or Interval. In the former case, you'd simply break
your Interval or Ordinal variables into categories and proceed as if they
were Nominal. In the latter, you'd transform each Ordinal or Nominal inde-
pendent variable to Interval by either: 1) dichotomizing it and assigning
scores of "0" and "1" to its categories; or 2) breaking it into categories
and creating a set of "dummy variables" (each scored 0/1) to represent its
categories.
If all Independent variables are Nominal, Log-Linear Analysis may be
used. For more info about Log-Linear Analysis, return to WATSTAT's Choice
Boxes and specify "Nominal" measurement for both the Dependent (Box 5) and
the Independent (Box 6) variables.
If all Independent variables are Interval (including dichotomies and
dummy variables), you can use Discriminant Analysis. For more info about
Discriminant Analysis, return to WATSTAT's Choice Boxes and specify
"Nominal" for the Dependent (Box 5) and "Interval" for the Independent
(Box 6) variables.
*D-MUL-1DEP-NOM/ORD,A
There is no procedure available to measure association between a Nominal
Dependent variable and Ordinal Independent variables. Your best alternative
is to categorize the Ordinal variables and treat them as Nominal; then you
can use Log-Linear Analysis. For more information on Log-Linear Analysis,
return to WATSTAT's Choice Boxes and specify "Nominal" measurement for both
the Dependent (Box 5) and the Independent (Box 6) variables.
*D-MUL-1DEP-ORD/ALL,A
There is no multivariate procedure designed to measure the association
between an Ordinal Dependent variable and a set of 2 or more Independent
variables. However, if you transform the Dependent variable (and perhaps
the Independent variables) a number of alternatives may be applicable.
You have 2 basic OPTIONS: 1) dichotomize the Dependent variable and treat
it as Interval, or 2) break the Dependent variable into 2 or more categories
and treat it as Nominal. OPTION 1 is preferable as long as it makes sense
to dichotomize the Dependent variable.
If you take OPTION 1, you can use either Multiple Regression/Correlation
or Logistic Regression, BUT to do so all your Independent variables must
also be Interval or Dichotomies (i.e., Nominal and Ordinal Independent vari-
ables must be dichotomized or represented as sets of "dummy variables").
For more info about Multiple Regression/Correlation, return to WATSTAT's
Choice Boxes and choose "Interval" measurement for both the Dependent vari-
able (Box 5) and the Independent (Box 6) variables. For more information on
Logistic Regression, specify "Dichotomy" (Box 5) and "Interval" (Box 6).
With OPTION 2, you can use either Discriminant Analysis or Log-Linear
Analysis. To use Discriminant Analysis, all Independent variables must be
Interval (i.e., Nominal & Ordinal Independent variables must be dichotomized
or represented as sets of "dummy variables"). With Log-Linear Analysis, all
Independent variables must be Nominal (i.e., Ordinal & Interval variables
must be represented as sets of 2 or more Nominal categories). For more info
about Discriminant Analysis, return to WATSTAT's Choice Boxes and specify
"Nominal" for the Dependent (Box 5) and "Interval" for the Independent
variables. For more info about Log-Linear Analysis, specify "Nominal" for
both Dependent (Box 5) and Independent (Box 6) variables.
*D-MUL-1DEP-INT/INT,A
If your Dependent variable is Interval and all your Independent variables
are also Interval (or dichotomies) your best choice is Multiple Regression/
Correlation. Use the Multiple Correlation statistics (R and R-Squared) to
index the strength of the relation between the Dependent variable and all
the Independent variables jointly. Use the Regression Coefficients (b)
to index the effect of each Independent variable and use the Standard Error
of Estimate to index the precision with which the set of Independent vari-
ables predicts (estimates) scores on the Dependent variable.
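[The Python sketch below (an illustration only, with hypothetical data;
not part of WATSTAT) computes the Regression Coefficients, R-Squared,
and the Standard Error of Estimate by solving the least-squares normal
equations directly.]

```python
import math

def solve_linear(a, b):
    """Solve A x = b by Gaussian elimination (small systems only)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(xs, y):
    """Least-squares fit of y on several Interval Independent variables.
    Returns ([b0, b1, ...], R-Squared, Standard Error of Estimate)."""
    n = len(y)
    design = [[1.0] + list(row) for row in xs]      # intercept column
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y))
           for i in range(p)]
    coef = solve_linear(xtx, xty)
    yhat = [sum(c * v for c, v in zip(coef, row)) for row in design]
    ybar = sum(y) / n
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    see = math.sqrt(ss_res / (n - p))   # Standard Error of Estimate
    return coef, r_squared, see

# Hypothetical data constructed so that y = 2 + 3*x1 - x2 exactly:
xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [2 + 3 * x1 - x2 for (x1, x2) in xs]
coef, r_squared, see = ols(xs, y)   # coef -> [2, 3, -1]; R-Squared -> 1
```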
*D-MUL-1DEP-INT/OTHER,A
There is no multivariate procedure designed to relate an Interval depend-
ent variable with Nominal or Ordinal Independent variables. However, after
some simple transformations, you can treat Nominal and Ordinal variables as
if they were Interval and use Multiple Correlation/Regression procedures.
Dichotomous Independent variables (scored 1/0) can be treated as Interval
in these procedures and you can dichotomize whenever it makes sense to treat
a Nominal variable as "present" vs. "absent" (1 vs. 0) or an Ordinal vari-
able as "high" vs. "low" (1 vs. 0). However, it is often desirable to pre-
serve a more detailed representation of Nominal & Ordinal variables: this
can be done by dividing them into categories and using a SET of dichotomous
variables, called "dummy variables," to represent the categories.
Use the Multiple Correlation statistics (R and R-Squared) to index the
strength of the relation between the Dependent variable and all the indepen-
dent variables operating jointly. Use the Regression Coefficients (b-values)
to index the effect of each Independent variable and use the Standard Error
of Estimate to index the precision with which the set of Independent vari-
ables predicts (estimates) scores on the Dependent variable.
*D-MUL-1DEP-DICH/NOM,A
Log-Linear Analysis is specifically designed to assess association
between a Nominal Dependent variable and a set of Nominal Independent vari-
ables. The fact that your Dependent variable is dichotomous presents no
problems, as long as it makes sense to treat it as a Nominal variable.
*D-MUL-1DEP-DICH/ORD,A
There is no procedure designed to measure association between a dichoto-
mous Dependent variable and Ordinal Independent variables. Your best alter-
native is to categorize the Ordinal variables and treat them as Nominal;
then you can use Log-Linear Analysis. For more information about Log-Linear
Analysis, return to WATSTAT's Choice Boxes and specify "Nominal" measurement
for both Dependent (Box 5) and Independent (Box 6) variables.
*D-MUL-1DEP-DICH/INT,A
Several multivariate procedures are potentially applicable if the depen-
dent variable is a dichotomy and all the Independent variables are Interval.
In order of preference, the available options include: Logistic Regression,
Discriminant Analysis, & Multiple Correlation/Regression. Logistic Regress-
ion is almost certain to be applicable. Discriminant Analysis is a good
alternative when category frequencies on the Dependent variable approach a
50%/50% split, but should not be used when the split is more extreme than
80%/20%. Multiple Correlation/Regression is less generally applicable when
the Dependent variable is a dichotomy: although the Dependent variable is
scored 0 and 1 (for "low" & "high"), some predicted (Y') scores may attain
impossible values (less than 0 or greater than 1). If there are many impos-
sible values, or if such values will cause problems in interpreting your
results, Multiple (Linear) Correlation/Regression should NOT be used.
*D-MUL-1DEP-DICH/MIXON,A
There is no procedure designed to measure association between a dichoto-
mous Dependent variable and "mixed" Ordinal/Nominal Independent variables.
Your best alternative is to categorize the Ordinal variables and treat them
as Nominal; then you can use Log-Linear Analysis, which assumes that all the
Independent variables are Nominal. For more info about Log-Linear Analysis,
return to WATSTAT's Choice Boxes and specify "Nominal" measurement for both
Dependent (Box 5) and Independent (Box 6) variables.
*D-MUL-1DEP-DICH/MIXIO,A
There is no procedure designed to measure association between a dichoto-
mous Dependent variable and Independent variables with "mixed" measurement
levels, so you'll need to transform one or more Independent variables to
make them ALL either Nominal or Interval. In the former case, you'd simply
break any Interval or Ordinal variables into categories and proceed as if
they were Nominal. In the latter, you'd transform each Ordinal or Nominal
Independent variable to Interval by either: 1) dichotomizing it and assign-
ing scores of "0" and "1" to its categories; or 2) breaking it into catego-
ries and creating a set of "dummy variables" (each scored 0/1) to represent
the categories.
If all Independent variables can be treated as Nominal, you can use
Log-Linear Analysis. For more info about Log-Linear Analysis, return to
WATSTAT's Choice Boxes and specify "Nominal" measurement for both Dependent
(Box 5) and Independent (Box 6) variables.
If all Independent variables are Interval (including dichotomies and
dummy variables), you can use Logistic Regression or Discriminant Analysis.
For more info about these procedures, return to WATSTAT's Choice Boxes and
specify "Dichotomy" for the Dependent (Box 5) variable and "Interval" for
the Independent (Box 6) variables.
*D-MUL-2DEP-INT/INT,A
Several multivariate procedures are potentially applicable when all your
variables are Interval and you're dealing with 2 or more Dependent variables
simultaneously. They include: Canonical Correlation; measures of association
derived from MANOVA; and various Structural Equation Modelling procedures,
e.g., LISREL and EQS. All these assume advanced statistical training and
must be performed by computer. Moreover, so much additional information is
needed to choose from these alternatives that WATSTAT cannot recommend a
"best" procedure here.
*D-MUL-2DEP-INT/NOTINT,A
Several multivariate procedures are potentially applicable when you're
dealing with 2 or more Dependent variables simultaneously. They include:
Canonical Correlation, measures of association derived from MANOVA, and
various procedures for Structural Equation Modelling (e.g., LISREL and EQS).
However, all require advanced statistical training and must be performed by
computer. Further, all assume Interval measurement for ALL variables, so
you won't be able to use them unless you drop "lower-level" variables or
transform them to sets of dummy variables. Finally, so much additional
information is needed to choose from these alternatives that WATSTAT can't
recommend a "best" procedure here.
*D-MUL-2DEP-NOTINT,A
Several multivariate procedures are potentially applicable when you're
dealing with 2 or more Dependent variables simultaneously. They include:
Canonical Correlation, measures of association derived from MANOVA, and
various procedures for Structural Equation Modelling (e.g., LISREL and EQS).
However, all require advanced statistical training and must be performed by
computer. Further, all assume Interval measurement for ALL variables in the
analysis, so you probably won't be able to use them. Finally, so much addi-
tional information is needed to choose from these alternatives that WATSTAT
can't recommend a "best" procedure here.
*D-MUL-NODEP-INT,A
Factor Analysis is recommended for assessing relationships among several
Interval-level variables when there is no Dependent variable identified.
[Dichotomous variables, scored 0/1, may also be Factor Analyzed.]
There are many types of Factor Analysis and selecting the appropriate
type is too complicated for WATSTAT to handle: you'll need to consult a
specialized text on Factor Analysis. Computations require a computer, and
most popular statistical packages offer a variety of Factor Analysis proce-
dures. [The manuals for some of these packages are good sources of advice
on which type of Factor Analysis to apply.]
*D-MUL-NODEP-RANK,A
Kendall's Coefficient of Concordance (Kendall's W) is designed to assess
relationships among 3 or more Ordinal variables when there is no Dependent
variable identified. All variables must be transformed to RANKS if they are
not inherently in rank form. The interpretation of Kendall's W is facili-
tated by its linear relationship to "Average Rho," i.e., the mean rank-order
correlation (Spearman's Rho) between all possible pairs of variables.
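[The Python sketch below (an illustration only, assuming untied ranks;
not part of WATSTAT) computes Kendall's W and the "Average Rho" it is
linearly related to.]

```python
def kendalls_w(rankings):
    """Kendall's W for several Ordinal variables scored as RANKS
    (no ties assumed).  Returns (W, average Spearman Rho)."""
    m = len(rankings)              # number of variables
    n = len(rankings[0])           # number of cases
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((ri - mean_sum) ** 2 for ri in rank_sums)
    w = 12 * s / (m ** 2 * (n ** 3 - n))
    avg_rho = (m * w - 1) / (m - 1)     # linear relation to W
    return w, avg_rho

# Three hypothetical variables, each ranking the same 4 cases:
ranks = [[1, 2, 3, 4],
         [1, 2, 3, 4],
         [1, 2, 3, 4]]
w, avg_rho = kendalls_w(ranks)      # perfect agreement: W = 1, Rho = 1
```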
*D-MUL-NODEP-NOTINT,A
Factor Analysis is the only widely-used procedure designed to assess
relationships among several variables when there is no Dependent variable
identified. Unfortunately, this procedure assumes that all variables are
Interval, so you can't use it for your "lower level" variables. However,
dichotomies (scored 0/1) may be treated as Interval here, so if you can
dichotomize your "lower level" variables, you can apply Factor Analysis.
*S-UNI-NOM,A
Assuming only Nominal Measurement, the Chi-Square Goodness-of-Fit Test
may be used to test whether it's likely that your RANDOM SAMPLE came from a
POPULATION with an hypothesized proportion of cases in its various catego-
ries. You specify the Population proportions (P) in the Null Hypothesis and
multiply each P by Sample Size to obtain EXPECTED FREQUENCIES for the test.
Within limits, you may specify any set of P's derived from theory or prior
knowledge of a relevant population.
If your variable is Dichotomous, the Binomial Test is preferable to the
Chi-Square Goodness-of-Fit, especially when sample size is small. Use Exact
Binomial Tables for small sample sizes and the Normal Approximation (z-Test)
for larger (>25) samples.
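[The Python sketch below (an illustration only, with hypothetical data;
not part of WATSTAT) computes the Chi-Square Goodness-of-Fit statistic
and the Normal-approximation (z) form of the Binomial Test.]

```python
import math

def chi_square_gof(observed, p_null):
    """Chi-Square Goodness-of-Fit: expected frequency = P * N for each
    category; df = number of categories - 1."""
    n = sum(observed)
    return sum((o - p * n) ** 2 / (p * n)
               for o, p in zip(observed, p_null))

def binomial_z(successes, n, p_null=0.5):
    """Normal approximation (z-Test) to the Binomial Test, for larger
    samples; use Exact Binomial Tables when n is small."""
    p_hat = successes / n
    return (p_hat - p_null) / math.sqrt(p_null * (1 - p_null) / n)

# Hypothetical 4-category variable; Null Hypothesis: equal proportions.
chi2 = chi_square_gof([30, 20, 25, 25], [0.25] * 4)    # df = 3
z = binomial_z(60, 100)     # dichotomous variable, 60 "successes"
```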
*S-UNI-RANK,A
In the special situation where "scores" or Ranks represent a SEQUENCE of
cases, the so-called "Test for Runs Up and Down" can be used to test for a
TREND, i.e., a tendency for scores to increase or decrease over a sequence.
If data are NOT SEQUENCED and NOT RANKED, your best alternative is to
categorize the data and to apply a test designed for "Partially Ordered"
data (One-Sample Kolmogorov-Smirnov Test) or Nominal data (Chi-Square
Goodness-of-Fit Test). There is no Univariate test for UNSEQUENCED RANKS.
*S-UNI-PART,A
The Kolmogorov-Smirnov One-Sample Test is recommended for a Categorized
Ordinal ("Partially Ordered") variable. It tests the Null Hypothesis that
the random sample was drawn from a Population with some specified Proportion
of cases in the various categories: you specify these Proportions based on
theory or prior information about the Population.
*S-UNI-INT,A
Use the One-Sample t-Test to determine whether it is likely that your
sample was DRAWN FROM A POPULATION WITH A KNOWN (or guessed) MEAN, which
you specify in the Null Hypothesis. Besides requiring INTERVAL MEASUREMENT,
valid application of this test assumes the sample was drawn from a NORMALLY
DISTRIBUTED POPULATION. Check to see that your data adequately meet these
assumptions: most intro. texts explain conditions under which they may be
relaxed.
If you're interested in estimating the MEAN of the POPULATION from which
your RANDOM SAMPLE was drawn, compute CONFIDENCE LIMITS FOR THE MEAN.
If you're interested in the SHAPE of your variable's distribution, use
the Chi-Square Goodness-of-Fit Test to see if it's likely that your SAMPLE
was drawn from a POPULATION with an hypothesized proportion of cases in its
various categories. You specify the Population Proportions (P) in the NULL
Hypothesis and multiply each P by Sample N to get EXPECTED FREQUENCIES for
the test. Within limits, you may hypothesize any set of P's derived from
theory or prior knowledge of a population. If you get the P's from a table
of the Normal Distribution, you can use the Chi-Square Goodness-of-Fit Test
to see whether it's likely that your sample came from a NORMALLY DISTRIBUTED
POPULATION.
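[The Python sketch below (an illustration only, with hypothetical data;
not part of WATSTAT) computes the One-Sample t statistic and Confidence
Limits for the Mean; the critical t value must be looked up in a
t table for your df and confidence level.]

```python
import math
import statistics

def one_sample_t(sample, mu_null):
    """One-Sample t statistic; df = n - 1."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu_null) / se

def confidence_limits(sample, t_critical):
    """Confidence Limits for the Mean: mean +/- t* s/sqrt(n)."""
    n = len(sample)
    half = t_critical * statistics.stdev(sample) / math.sqrt(n)
    mean = statistics.mean(sample)
    return mean - half, mean + half

sample = [12, 15, 11, 14, 13]           # hypothetical Interval data
t = one_sample_t(sample, 10)            # sample mean 13 vs. Null mean 10
lo, hi = confidence_limits(sample, 2.776)   # t* for df = 4, 95% level
```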
*S-2SAMPLE-INT,A
Use Student's t-Test to compare TWO SUB-SAMPLE MEANS on an INTERVAL
DEPENDENT VARIABLE, where RANDOM SAMPLING or RANDOM ASSIGNMENT of cases has
yielded INDEPENDENT SUB-SAMPLES. Valid application of this test assumes:
1) that sub-samples were drawn from two NORMALLY DISTRIBUTED POPULATIONS, &
2) that the two parent POPULATIONS have EQUAL VARIANCES. Check to see that
your data approximate these assumptions: most intro. texts list conditions
under which these assumptions may be relaxed. A special form of the t-test
is available in cases where population variances are unequal.
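[The Python sketch below (an illustration only, with hypothetical data;
not part of WATSTAT) computes the pooled-variance form of Student's t
for two independent sub-samples; df = n1 + n2 - 2.]

```python
import math

def pooled_t(sample1, sample2):
    """Student's t for TWO INDEPENDENT SUB-SAMPLES (pooled variance)."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    ss1 = sum((x - m1) ** 2 for x in sample1)
    ss2 = sum((x - m2) ** 2 for x in sample2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se

# Hypothetical Interval scores for two independent sub-samples:
t = pooled_t([5, 7, 6, 8, 9], [3, 4, 5, 4, 4])   # df = 8
```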
*S-2MATCH-INT,A
Use the Matched-Pairs t-Test to compare TWO SUB-SAMPLE MEANS on an
INTERVAL DEPENDENT VARIABLE, where RANDOM SAMPLING or RANDOM ASSIGNMENT has
yielded MATCHED (dependent) SUB-SAMPLES. Valid application of this test
assumes that sub-samples were drawn from 2 NORMALLY DISTRIBUTED POPULATIONS.
Check to see that your data approximate this assumption: most intro. texts
list conditions under which it may be relaxed.
*ARCSINE,A
A number of tests are available for comparing 2 dichotomous sub-samples,
in cases where RANDOM SAMPLING OR RANDOM ASSIGNMENT has yielded INDEPENDENT
SUB-SAMPLES. (They are listed in order of preference.) The Arcsine Test is
the preferred alternative, especially if sample size is small. A Chi-Square
Contingency Test, with data cast in a 2-by-2 table, gives similar results
when sample size is large. For smaller samples, Fisher's Exact may be used.
Special forms of the z-test and t-test, which test for DIFFERENCES IN PRO-
PORTIONS, are also applicable. Consult a statistics text for the assump-
tions underlying each of these tests.
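[The Python sketch below (an illustration only, with hypothetical
counts; not part of WATSTAT) shows one common form of the Arcsine Test,
based on the variance-stabilizing transformation phi = 2*arcsin(sqrt(p)).]

```python
import math

def arcsine_z(successes1, n1, successes2, n2):
    """Arcsine Test (z form) for a difference between two independent
    proportions; refer z to the Standard Normal table."""
    phi1 = 2 * math.asin(math.sqrt(successes1 / n1))
    phi2 = 2 * math.asin(math.sqrt(successes2 / n2))
    return (phi1 - phi2) / math.sqrt(1 / n1 + 1 / n2)

# Hypothetical sub-samples: 30/50 vs. 15/50 "successes":
z = arcsine_z(30, 50, 15, 50)
```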
*FISHER-EXACT,A
Fisher's Exact Test is usually the best alternative for detecting a
difference between INDEPENDENT SUB-SAMPLES when sample size is very small
and data can be cast in a 2-by-2 contingency table. Fisher's Exact Test is
also used as an alternative to the Chi-Square Contingency Test when sample
size is too small to apply the latter: in such cases it is used to test for
the significance of an ASSOCIATION BETWEEN 2 DICHOTOMOUS NOMINAL VARIABLES.
Although not widely known, Fisher's Exact Test can be extended to tables
larger than a 2-by-2: the only problem is finding a computer program that
calculates p-values for larger tables.
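[The Python sketch below (an illustration only; not part of WATSTAT)
computes a two-sided Fisher's Exact p-value for a 2-by-2 table by
summing the probabilities of all tables, with the same margins, that
are no more probable than the one observed.]

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's Exact Test p-value for the table [[a, b],
    [c, d]], based on the hypergeometric distribution."""
    r1, r2 = a + b, c + d           # row totals
    c1, n = a + c, a + b + c + d    # first column total, sample size
    def prob(x):                    # P(first cell = x), margins fixed
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)
    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs + 1e-12)

p = fisher_exact_2x2(1, 9, 11, 3)   # hypothetical sparse 2-by-2 table
```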
*MCNEMAR,A
The McNemar Test is designed to compare a DICHOTOMOUS DEPENDENT VARIABLE
across 2 MATCHED SUB-SAMPLES. The Dependent variable may be inherently
dichotomous or transformed to a dichotomy especially for the test. There is
NO TEST designed to compare a Dependent variable with 3 or more categories
across Matched Sub-Samples.
The McNemar Test assumes only Nominal Measurement, but if an Ordinal
Dependent variable is dichotomized at the Overall Median, it can be used as
a test for differences between Medians for MATCHED SAMPLES.
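[The Python sketch below (an illustration only; not part of WATSTAT)
computes the McNemar Chi-Square (1 df); only the two DISCORDANT cell
counts enter the test.]

```python
def mcnemar_chi2(b, c):
    """McNemar test statistic: b = pairs scored (1,0), c = pairs
    scored (0,1); concordant pairs do not enter the computation."""
    return (b - c) ** 2 / (b + c)

# Hypothetical matched pairs: 15 changed one way, 5 the other:
chi2 = mcnemar_chi2(15, 5)      # refer to Chi-Square table with 1 df
```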
*MEDIAN-TEST,A
The Median Test is designed to compare 2 INDEPENDENT SUB-SAMPLES when
the DEPENDENT VARIABLE is ORDINAL and when it is feasible to determine the
OVERALL MEDIAN OF THE TOTAL SAMPLE. Although tests based on ranks are
preferable, the Median Test is a good alternative when data are "Partially
Ordered" or when sample size is so large that it is infeasible to rank the
data.
The Median Test is really a "transformation" rather than a distinct test:
data are cast in a 2-by-2 contingency table by breaking the Dependent vari-
able at the overall Median; then either the Chi-Square Contingency Test or
Fisher's Exact Test is applied, depending on sample size.
The Median Test can also be applied when there are 3 or More INDEPENDENT
SUB-SAMPLES. In this case, the Dependent variable is again Dichotomized at
the OVERALL MEDIAN, but data are cast in a 2-by-k contingency table, where
k is the number of sub-samples. Then the Chi-Square Contingency Test is
applied.
*WILCOX-MATCH,A
The appropriate test for a difference between TWO MATCHED SUB-SAMPLES,
when the ORDINAL DEPENDENT VARIABLE is scored as RANKS, is the Wilcoxon
Matched-Pairs Test [sometimes called the Matched-Pairs Signed-Ranks Test].
*WILCOX-RSUM,A
Two tests, the Wilcoxon Rank-Sum Test and the Mann-Whitney U-Test, can
be applied to test for a difference between TWO INDEPENDENT SUB-SAMPLES,
when the ORDINAL DEPENDENT VARIABLE is scored as RANKS. These are really
two forms of the same test and yield exactly the same p-values. Although
the Mann-Whitney is more widely used, the Wilcoxon Rank-Sum Test is much
easier to compute and interpret and, therefore, preferable. [Don't confuse
this Rank-Sum Test with Wilcoxon's Matched-Pairs Test, which is used for
DEPENDENT SUB-SAMPLES.]
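[The Python sketch below (an illustration only, assuming no ties; not
part of WATSTAT) shows that the Rank-Sum W and the Mann-Whitney U are
two forms of the same test: U = W - n1(n1+1)/2.]

```python
def rank_sum_and_u(sample1, sample2):
    """Wilcoxon Rank-Sum W (sum of sample1's ranks in the combined
    data) and the equivalent Mann-Whitney U; assumes no tied scores."""
    combined = sorted(sample1 + sample2)
    rank = {v: i + 1 for i, v in enumerate(combined)}  # 1 = smallest
    n1 = len(sample1)
    w = sum(rank[v] for v in sample1)
    u = w - n1 * (n1 + 1) / 2
    return w, u

# Hypothetical ranked scores for two independent sub-samples:
w, u = rank_sum_and_u([1, 2, 4], [3, 5, 6])
```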
*ONEWAY,A
The appropriate significance test for differences between Means of three
or more INDEPENDENT SUB-SAMPLES is the so-called "ONE-WAY ANOVA F-TEST."
This is an "overall" test: it detects differences between pairs or combina-
tions of sub-samples, but it can't specify which sub-samples differ. Thus,
it must be followed by more specific tests, called CONTRASTS, to pinpoint
which sub-samples differ. Besides assuming INDEPENDENT SUB-SAMPLES and
INTERVAL MEASUREMENT, this F-Test assumes that sub-samples were drawn from
NORMALLY DISTRIBUTED POPULATIONS that have EQUAL VARIANCES. Check to see
that your data approximate all these assumptions: most intro. texts specify
conditions under which they may be relaxed. Consult a specialized text on
Analysis of Variance (ANOVA) for help in selecting a test for CONTRASTS
following the overall F-Test. [Usually, the Duncan Multiple-Range Test is
best for Contrasts between PAIRS of sub-samples and the Scheffe Test best
for Contrasts between GROUPS of sub-samples, but there are many other alter-
natives that may be preferable in your case.]
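[The Python sketch below (an illustration only, with hypothetical data;
not part of WATSTAT) computes the One-Way ANOVA F-Ratio from between-
and within-group sums of squares.]

```python
def oneway_f(groups):
    """One-Way ANOVA F-Ratio for independent sub-samples.
    Returns (F, between-groups df, within-groups df)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g)
                    for g, m in zip(groups, means))
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, k - 1, n - k

# Three hypothetical sub-samples on an Interval Dependent variable:
f, df_between, df_within = oneway_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```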
*TWOWAY,A
The best significance test for differences between Means of 3 or more
MATCHED SUB-SAMPLES is the ANALYSIS OF VARIANCE F-TEST FOR RANDOMIZED BLOCKS,
which is sometimes loosely called "TWO-WAY" ANOVA. In this design, "Blocks"
may be individual cases or sets of matched cases, which are represented in
all the sub-samples. Blocks are used to "control" extraneous between-case
variation. When individual cases appear in all the sub-samples, the design
is referred to as a RANDOMIZED BLOCKS DESIGN WITH REPEATED MEASURES.
The F-Test is an "overall" test: it detects differences between pairs or
combinations of sub-samples, but it can't specify which sub-samples differ.
Thus, it must be followed by more specific tests, called CONTRASTS, to pin-
point which sub-samples differ. Besides assuming INTERVAL MEASUREMENT, this
F-Test assumes that sub-samples were drawn from NORMALLY DISTRIBUTED POPULA-
TIONS that have EQUAL VARIANCES. Check to see that your data approximate
all these assumptions. Specialized texts on Analysis of Variance (ANOVA)
usually contain extensive explanations of underlying assumptions and also
offer help in selecting a test for CONTRASTS following the overall F-Test.
*CR-FACTORIAL,A
ANALYSIS OF VARIANCE with a COMPLETELY RANDOMIZED FACTORIAL (CRF) design
is the best alternative when you have: an 1) INTERVAL DEPENDENT VARIABLE,
2) TWO OR MORE COMPARISON VARIABLES, and 3) NO MATCHING of cases across
sub-samples of any Comparison Variable. [The last condition implies that
each case appears in the analysis one and only one time.]
The CRF design yields an F-Test for each Comparison Variable and also
for INTERACTION EFFECTS due to sets of these variables. The F-Tests are
"overall" tests: they detect differences between pairs or combinations of
sub-samples, but don't specify which sub-samples differ. Thus, they must
be followed by more specific tests, called CONTRASTS, to pinpoint which
sub-samples differ. Besides INTERVAL MEASUREMENT, the F-Tests assume that
the sub-samples were drawn from NORMALLY DISTRIBUTED POPULATIONS that have
EQUAL VARIANCES. Check to see that your data approximate all these assump-
tions. Specialized texts on Analysis of Variance usually contain extensive
explanations of underlying assumptions and the conditions under which they
may be relaxed. Only a few offer help in selecting the most appropriate
test for CONTRASTS in CRF Designs.
*RB-FACTORIAL,A
ANALYSIS OF VARIANCE with a RANDOMIZED BLOCKS FACTORIAL (RBF) design is
the best alternative if you have: 1) an INTERVAL DEPENDENT VARIABLE, 2) TWO
OR MORE COMPARISON VARIABLES, and 3) MATCHED CASES or OBSERVATIONS across
sub-samples of one or more Comparison Variables. In this design, "Blocks"
may be individual cases or sets of matched cases, which are represented in
all the sub-samples of a Comparison Variable. Blocks are used to "control"
extraneous between-case variation. When individual cases appear in all the
sub-samples of any Comparison Variable, the design is referred to as a
RANDOMIZED BLOCKS FACTORIAL DESIGN WITH REPEATED MEASURES. When the Blocks
are split into "Sub-Blocks" on one or more "Blocking Variables" the design
is referred to as a SPLIT-PLOT DESIGN.
The RBF design yields an F-Test for each Comparison Variable and also
for INTERACTION EFFECTS due to sets of these variables. The F-Tests are
"overall" tests: they detect differences between pairs or combinations of
sub-samples, but don't specify which sub-samples differ. Thus, they must
be followed by more specific tests, called CONTRASTS, to pinpoint which of
the sub-samples differ. Besides INTERVAL MEASUREMENT, the F-Tests assume
that sub-samples were drawn from NORMALLY DISTRIBUTED POPULATIONS that have
EQUAL VARIANCES. Check to see that your data approximate all these assump-
tions. Specialized texts on Analysis of Variance usually contain extensive
explanations of underlying assumptions and the conditions under which they
may be relaxed. Only a few offer help in selecting the most appropriate
test for CONTRASTS in RBF or Split-Plot Designs.
*ANOVA/REGN,A
[Traditional ANOVA computations for the above design require EQUAL FREQUEN-
CIES in all the cells created when the sample is split by 2 or more Compar-
ison Variables. If cell frequencies are unequal, F-Ratios can be obtained
through Multiple Regression procedures, of which ANOVA is a special case.
Most computer programs use Multiple Regression for all ANOVA problems, but
hide this fact by reporting results in a conventional ANOVA Summary Table.]
*ANCOVA,A
If you have one or more Independent variables that you wish to "control"
or "adjust for" without building them in as Comparison Variables, you can
apply ANALYSIS OF COVARIANCE (ANCOVA) procedures. ANCOVA is an extension of
ANOVA in which the effects of one or more INTERVAL-LEVEL INDEPENDENT VARI-
ABLES are "partialled out," through Multiple Regression procedures, before
F-Ratios are computed for the major Comparison Variables. Normally, vari-
ables are selected for such adjustment because they create "extraneous"
variation in the Dependent Variable and can't be eliminated physically.
ANCOVA usually requires a computer and most popular statistical packages
can perform it. To use ANCOVA, you must meet all the assumptions of ANOVA
and Multiple Regression, plus some additional ones unique to this procedure.
Specialized texts on Analysis of Variance usually explain all these assump-
tions and the conditions under which they may be relaxed.
*MANOVA,A
MULTIVARIATE ANALYSIS OF VARIANCE (MANOVA) is an extension of ANOVA
designed to handle two or more INTERVAL-LEVEL DEPENDENT VARIABLES simulta-
neously. The application of MANOVA and the interpretation of its results
requires advanced statistical training. If you lack such expertise, and if
your theory demands MANOVA, it would be wise to seek help from a statistical
consultant before attempting to apply it. It may be wiser yet to choose a
procedure that can be applied in separate analyses for each Dependent vari-
able. If the latter alternative is feasible, WATSTAT may be able to offer
more help: return to the Choice Boxes and select "Multivariate with ONE
Dependent Variable" in Box 4.
*CHI-LOGIST,A
Significance tests associated with Logistic Regression PARALLEL those
used with Linear Multiple Regression: there are tests for overall fit of
the equation as well as for individual Regression Coefficients. However,
as Logistic Regression is based on a different equation-fitting criterion,
neither the tests nor their interpretations are IDENTICAL to their Linear
counterparts. Logistic Regression also has its own set of assumptions and
limitations, which you'll need to consider.
*CHI-COMP-NOM,A
Use the Chi-Square Contingency Test to determine whether it is likely
that your RANDOM SAMPLE was drawn from a set of Sub-Populations (correspond-
ing to your Sub-Samples) that have the same proportion of cases in the
various categories of the Dependent Variable. [Chi-Square must be computed
on RAW FREQUENCIES: don't make the common beginner's error of computing it
from a table of Percentages or Proportions.]
*CHI-PHI,A
The appropriate significance test for the Phi Coefficient or Cramer's V
is the Chi-square Contingency Test. Fisher's Exact Test may be used as a
test for Phi if sample size is too small for the Chi-Square Test.
*TTEST-BIV-R,A
A special t-Test or F-Test is used to test for the significance of the
Correlation Coefficient (r) or the Regression Coefficient (b). In the bi-
variate case, t and F Tests yield exactly the same p-values and tests for
r and b are equivalent. Besides requiring INTERVAL MEASUREMENT, these tests
assume BIVARIATE NORMALITY. Check to see that your data approximate this
assumption: most intro. texts list conditions under which it may be relaxed.
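[The Python sketch below (an illustration only; not part of WATSTAT)
computes the t statistic for a Correlation Coefficient r; df = n - 2.]

```python
import math

def t_for_r(r, n):
    """t statistic for testing r (or the equivalent bivariate b)
    against the Null Hypothesis of zero correlation; df = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

t = t_for_r(0.6, 27)    # hypothetical r = .6 from a sample of 27
```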
*TTEST-RHO,A
A special t-Test is used to test for the significance of Spearman's Rho.
The computing formula for this test is the same as that used for the Linear
Correlation Coefficient (r) except that Rho replaces r in the computations.
*ZTEST-TAU,A
The significance test for Kendall's Tau uses a z-statistic, which is
referred to a table of the Standard Normal Distribution to obtain p-values.
For sample sizes less than 10, exact tables are available and should be used
instead of the Normal approximation.
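[The Python sketch below (an illustration only; not part of WATSTAT)
computes the Normal-approximation z for Kendall's Tau; use exact
tables instead when the sample size is less than 10.]

```python
import math

def z_for_tau(tau, n):
    """z statistic for Kendall's Tau under the Null Hypothesis of no
    association; refer z to the Standard Normal table."""
    return tau / math.sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))

z = z_for_tau(0.5, 20)      # hypothetical Tau = .5 with n = 20
```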
*FTEST-ETA,A
The significance test used for the Correlation Ratio (Eta) is the F-Test
obtained from a ONE-WAY ANALYSIS OF VARIANCE.
*FTEST-MULTR,A
An F-Test is used to test for the significance of the Multiple Correla-
tion Coefficient. A special t-Test or F-Test (yielding identical p-values)
is used to test the significance of each Regression Coefficient in the equa-
tion. F-Tests for "R-Square Change" can be used to test whether a set of
two or more Independent Variables contributes significantly to the fit of
the equation. Valid application of these tests rests on many stringent
assumptions: consult a Multiple Regression/Correlation text for information
about these assumptions and check to see that your data meet them.
*S-LOG-LIN,A
Several significance tests are usually applied in a Log-Linear Analysis,
all of which are referred to the Chi-Square Distribution to obtain p-values.
In addition to a test for overall fit of a Log-Linear Model (analogous to a
test for R-Squared in Regression), tests are usually made for MAIN EFFECTS
and INTERACTION EFFECTS (analogous to F-Tests in Analysis of Variance).
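The simplest such test, the likelihood-ratio (G-Squared) test of
independence in a two-way table, can be sketched as follows (made-up
counts; SciPy assumed):

```python
# Sketch: likelihood-ratio Chi-Square (G-Squared) for a 2x3 table
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[20, 30, 25],
                  [35, 15, 20]])
# lambda_="log-likelihood" requests G-Squared instead of Pearson Chi-Square
g2, p, dof, expected = chi2_contingency(table, lambda_="log-likelihood")
```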
*S-DISCRIM,A
Several F-Tests are usually applied in a Discriminant Analysis, includ-
ing: a test for fit of each discriminant function, tests for the contribu-
tion of each Discriminant Function Coefficient, and tests for differences
between groups. Computer programs also use significance tests as criteria
for including variables and for terminating the analysis. [The validity of
these criteria, like ALL significance tests, rests on the assumption of
Random Sampling.]
*S-FACTOR-ANAL,A
Numerous tests can be applied in Factor Analysis, including tests for
Factor Loadings, Correlations between Factors, and the Number of Factors.
When the focus is on description, as it is in so-called "Exploratory Factor
Analysis," there is usually no need for any tests. However, significance
tests become central when the Factor Analysis is used to address theoretical
hypotheses, as in "Confirmatory Factor Analysis."
*S-KENDALL-W,A
The significance test for Kendall's W uses exact tables when sample
size and the number of variables are small. Otherwise, a Chi-Square stat-
istic is used. The Null Hypothesis tested is that the sample was drawn
from a population in which the variables are mutually Independent.
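The large-sample version can be sketched as follows (made-up ranks from 3
judges over 5 items, no ties; SciPy assumed for the Chi-Square p-value):

```python
# Sketch: Kendall's W and its large-sample Chi-Square test
from scipy import stats

# rows = judges (k), columns = items (n); entries are ranks 1..n (no ties)
ranks = [
    [1, 2, 3, 4, 5],
    [2, 1, 3, 5, 4],
    [1, 3, 2, 4, 5],
]
k, n = len(ranks), len(ranks[0])
col_sums = [sum(row[j] for row in ranks) for j in range(n)]
mean_sum = sum(col_sums) / n                  # = k(n+1)/2
S = sum((c - mean_sum) ** 2 for c in col_sums)
W = 12 * S / (k ** 2 * (n ** 3 - n))          # Kendall's W, 0 <= W <= 1
chi2 = k * (n - 1) * W                        # Chi-Square with df = n - 1
p = stats.chi2.sf(chi2, n - 1)
```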
*S-COCHRANQ,A
Cochran's Q Test is designed to compare a DICHOTOMOUS DEPENDENT VARIABLE
across 3 or more MATCHED SUB-SAMPLES. The Dependent variable may be inher-
ently dichotomous or transformed to a dichotomy especially for the Q-test.
There is NO TEST designed to compare a Dependent variable with 3 or more
categories across Matched Sub-Samples.
Cochran's Q Test assumes only Nominal Measurement, but if an Ordinal
Dependent variable is dichotomized at the OVERALL MEDIAN, it can be used to
test the Null Hypothesis that Matched Sub-Samples were RANDOMLY drawn from
Populations with the same Median.
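A sketch of the Q statistic itself (made-up 0/1 data for 8 matched cases
under 3 conditions; SciPy assumed only for the Chi-Square p-value):

```python
# Sketch: Cochran's Q Test (Q is referred to Chi-Square with k-1 df)
from scipy import stats

# rows = matched cases, columns = 3 conditions; entries are 0/1
data = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
k = len(data[0])
col = [sum(row[j] for row in data) for j in range(k)]  # condition totals
rows = [sum(r) for r in data]                          # case totals
T = sum(rows)
Q = (k - 1) * (k * sum(c * c for c in col) - T * T) / (k * T - sum(r * r for r in rows))
p = stats.chi2.sf(Q, k - 1)
```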
*KRUSKAL,A
The Kruskal-Wallis Test is designed to compare an ORDINAL DEPENDENT
VARIABLE across 3 or more INDEPENDENT SUB-SAMPLES. If the Dependent vari-
able is not inherently Ranked it must be transformed to Ranks for the test.
The Kruskal-Wallis is an analogue of One-Way ANOVA and uses a Chi-Square
test statistic in place of the ANOVA F-Test.
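For example (made-up scores for 3 Independent Sub-Samples; SciPy assumed),
the test statistic H is referred to the Chi-Square distribution with k-1
degrees of freedom:

```python
# Sketch: Kruskal-Wallis Test across 3 Independent Sub-Samples
from scipy import stats

g1 = [27, 2, 4, 18, 7, 9]
g2 = [20, 8, 14, 36, 21, 22]
g3 = [34, 31, 3, 23, 30, 6]

h, p = stats.kruskal(g1, g2, g3)  # ranks the pooled data internally
```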
*FRIEDMAN,A
The Friedman Test is designed to compare an ORDINAL DEPENDENT VARIABLE
across 3 or more MATCHED SUB-SAMPLES. If the Dependent variable is not
inherently Ranked it must be transformed to Ranks for the test. This test
is an analogue of "Two-Way ANOVA" (Randomized Blocks ANOVA) and uses a
Chi-Square test statistic in place of the ANOVA F-Test.
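For example (made-up scores for 5 matched blocks under 3 treatments; SciPy
assumed), each block is ranked internally and the Chi-Square statistic has
k-1 degrees of freedom:

```python
# Sketch: Friedman Test across 3 Matched Sub-Samples
from scipy import stats

# one list per treatment; position i in each list is the same matched case
t1 = [7.0, 9.9, 8.5, 5.1, 10.3]
t2 = [5.3, 5.7, 4.7, 3.5, 7.7]
t3 = [4.9, 7.6, 5.5, 2.8, 8.4]

chi2, p = stats.friedmanchisquare(t1, t2, t3)
```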
*S-COMP2-RANK,A
There is no well-known significance test for Ordinal data that can
handle 2 or more Independent (Comparison) Variables in a single analysis.
That is, there are no Ordinal-Level analogues to Factorial ANOVA, Analysis
of Covariance, etc., which are used with Interval Dependent Variables.
*S-COMP2-DICH,A
There is no test designed to compare a DICHOTOMOUS DEPENDENT VARIABLE
across SUB-SAMPLES created by 2 or more Independent (Comparison) variables.
However, if it's appropriate to shift the Analytical Focus from "Sub-Sample
Comparison" to "Association," a number of alternatives are open. Among
these are Logistic Regression and Discriminant Analysis. If your Analytical
Focus can be changed in this way -- if it MAKES SENSE to cast your research
questions in terms of Association -- return to WATSTAT's Choice Boxes and
select "No Sub-Sample Comparisons" in Box 2 and "Describe Association" in
Box 3. WATSTAT's Report will then give you more information about Logistic
Regression and Discriminant Analysis.
*S-COMP2-NOM-IND,A
There is no test designed to compare a NOMINAL DEPENDENT VARIABLE across
SUB-SAMPLES created by 2 or more Independent (Comparison) variables.
If it's appropriate to change your Analytical Focus from "Sub-Sample
Comparison" to "Association," a number of alternatives are open, namely,
Log-Linear Analysis, Logistic Regression, and Discriminant Analysis. If it
MAKES SENSE to re-cast your research questions in terms of Association,
return to WATSTAT's Choice Boxes and select "No Sub-Sample Comparisons" in
Box 2 and "Describe Association" in Box 3. WATSTAT's Report will then give
you more information about the above alternatives. [All these alternatives
require advanced statistical training: a wise novice will seek expert help.]
*S-COMP2-NOM-MATCH,A
There is NO MULTIVARIATE TEST designed to compare a NOMINAL DEPENDENT
VARIABLE across MATCHED SUB-SAMPLES created by 2 or more Comparison vari-
ables. If you haven't yet collected the data, consider ways to achieve an
Interval-Level measure of the Dependent variable. If the data are already
collected, and if it's appropriate and feasible to dichotomize the Dependent
variable, you may be able to use ANOVA F-Tests. [This will also require a
so-called ARCSINE TRANSFORMATION before ANOVA can be applied to a Dichotomous
Dependent variable.] If either of these options is viable in your case,
return to WATSTAT's Choice Boxes and select "Interval" in Box 5.
*COPYRIGHT,A
COPYRIGHT 1991 BY HAWKEYE SOFTWORKS, 300 GOLFVIEW AVE., IOWA CITY, IA, 52246